Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 26648 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.0 MiB |
| Average record size in memory | 40.0 B |
Variable types
| NUM | 8 |
|---|---|
| CAT | 1 |
Warnings
| Warning | Type |
|---|---|
| departure_time is highly correlated with departure_time_day | High correlation |
| departure_time_day is highly correlated with departure_time | High correlation |
| flight_path has 5780 (21.7%) zeros | Zeros |
| airline has 3897 (14.6%) zeros | Zeros |
| departure_time_day has 1934 (7.3%) zeros | Zeros |
| booking_day has 3776 (14.2%) zeros | Zeros |
| departure_day has 6957 (26.1%) zeros | Zeros |
| number_of_stops has 5301 (19.9%) zeros | Zeros |
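The zero percentages in the warnings can be reproduced from the zero counts and the number of observations (26648) reported under Dataset statistics. The following is a minimal sketch of that arithmetic; the names `zero_counts`, `zero_share` and `zero_shares` are illustrative and not part of the report.

```python
# Number of observations, copied from the "Dataset statistics" table.
N_OBSERVATIONS = 26648

# Zero counts, copied from the warnings listed in the report.
zero_counts = {
    "flight_path": 5780,
    "airline": 3897,
    "departure_time_day": 1934,
    "booking_day": 3776,
    "departure_day": 6957,
    "number_of_stops": 5301,
}


def zero_share(count: int, total: int = N_OBSERVATIONS) -> float:
    """Return the share of zero-valued cells in percent, rounded to one decimal."""
    return round(100.0 * count / total, 1)


# Recompute the percentage for every variable flagged with a "Zeros" warning.
zero_shares = {name: zero_share(count) for name, count in zero_counts.items()}

for name, share in zero_shares.items():
    print(f"{name}: {share}% zeros")
```

Each recomputed share agrees with the percentage shown in the corresponding warning.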
Reproduction
| Analysis started | 2020-10-07 12:55:39.234692 |
|---|---|
| Analysis finished | 2020-10-07 12:56:19.759433 |
| Duration | 40.52 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
df_index
Real number (ℝ≥0)
| Distinct | 5730 |
|---|---|
| Distinct (%) | 21.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2387.683166 |
|---|---|
| Minimum | 0 |
| Maximum | 5783 |
| Zeros | 7 |
| Zeros (%) | < 0.1% |
| Memory size | 208.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 230 |
| Q1 | 1147 |
| median | 2284.5 |
| Q3 | 3431 |
| 95-th percentile | 5120 |
| Maximum | 5783 |
| Range | 5783 |
| Interquartile range (IQR) | 2284 |
Descriptive statistics
| Standard deviation | 1487.008623 |
|---|---|
| Coefficient of variation (CV) | 0.622783058 |
| Kurtosis | -0.8135551969 |
| Mean | 2387.683166 |
| Median Absolute Deviation (MAD) | 1141.5 |
| Skewness | 0.3230022456 |
| Sum | 63626981 |
| Variance | 2211194.646 |
| Monotonicity | Not monotonic |
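Several of the derived statistics for df_index follow directly from other values in the tables, which gives a quick consistency check. The sketch below recomputes them from the reported figures only; the variable names are illustrative, and the small tolerances are an assumption that allows for the rounding of the printed values.

```python
# Values copied from the df_index tables in this report.
n_observations = 26648
total_sum = 63626981
std_dev = 1487.008623
q1, q3 = 1147, 3431
minimum, maximum = 0, 5783

# Mean is the sum divided by the number of observations.
mean = total_sum / n_observations

# Coefficient of variation (CV) is the standard deviation divided by the mean.
cv = std_dev / mean

# Variance is the square of the standard deviation.
variance = std_dev ** 2

# Interquartile range (IQR) and range are simple differences.
iqr = q3 - q1
value_range = maximum - minimum

print(f"mean={mean:.6f} cv={cv:.9f} variance={variance:.3f}")
print(f"iqr={iqr} range={value_range}")
```

The recomputed mean, CV, variance, IQR and range agree with the reported values up to rounding.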
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2023 | 7 | < 0.1% |
| 3321 | 7 | < 0.1% |
| 2552 | 7 | < 0.1% |
| 505 | 7 | < 0.1% |
| 537 | 7 | < 0.1% |
| 2616 | 7 | < 0.1% |
| 2632 | 7 | < 0.1% |
| 585 | 7 | < 0.1% |
| 601 | 7 | < 0.1% |
| 713 | 7 | < 0.1% |
| Other values (5720) | 26578 | 99.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7 | < 0.1% |
| 1 | 7 | < 0.1% |
| 2 | 7 | < 0.1% |
| 3 | 7 | < 0.1% |
| 4 | 7 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5783 | 1 | < 0.1% |
| 5782 | 1 | < 0.1% |
| 5781 | 1 | < 0.1% |
| 5780 | 1 | < 0.1% |
| 5779 | 1 | < 0.1% |
flight_path
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.888622035 |
|---|---|
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 5780 |
| Zeros (%) | 21.7% |
| Memory size | 26.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.200323654 |
|---|---|
| Coefficient of variation (CV) | 0.7617208576 |
| Kurtosis | -1.450638224 |
| Mean | 2.888622035 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.03060599217 |
| Sum | 76976 |
| Variance | 4.841424183 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5780 | 21.7% |
| 3 | 5184 | 19.5% |
| 6 | 4535 | 17.0% |
| 1 | 4360 | 16.4% |
| 5 | 4092 | 15.4% |
| 4 | 2000 | 7.5% |
| 2 | 697 | 2.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5780 | 21.7% |
| 1 | 4360 | 16.4% |
| 2 | 697 | 2.6% |
| 3 | 5184 | 19.5% |
| 4 | 2000 | 7.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 4535 | 17.0% |
| 5 | 4092 | 15.4% |
| 4 | 2000 | 7.5% |
| 3 | 5184 | 19.5% |
| 2 | 697 | 2.6% |
airline
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.881829781 |
|---|---|
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 3897 |
| Zeros (%) | 14.6% |
| Memory size | 26.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 3 |
| Q3 | 4 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.542275041 |
|---|---|
| Coefficient of variation (CV) | 0.5351721503 |
| Kurtosis | -0.5175576667 |
| Mean | 2.881829781 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.6340597304 |
| Sum | 76795 |
| Variance | 2.378612301 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 10194 | 38.3% |
| 4 | 5831 | 21.9% |
| 0 | 3897 | 14.6% |
| 5 | 3703 | 13.9% |
| 1 | 1672 | 6.3% |
| 2 | 1351 | 5.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3897 | 14.6% |
| 1 | 1672 | 6.3% |
| 2 | 1351 | 5.1% |
| 3 | 10194 | 38.3% |
| 4 | 5831 | 21.9% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 3703 | 13.9% |
| 4 | 5831 | 21.9% |
| 3 | 10194 | 38.3% |
| 2 | 1351 | 5.1% |
| 1 | 1672 | 6.3% |
departure_time_day
Real number (ℝ≥0)
| Distinct | 28 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.71206094 |
|---|---|
| Minimum | 0 |
| Maximum | 27 |
| Zeros | 1934 |
| Zeros (%) | 7.3% |
| Memory size | 26.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 6 |
| median | 14 |
| Q3 | 17 |
| 95-th percentile | 20 |
| Maximum | 27 |
| Range | 27 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 6.630470153 |
|---|---|
| Coefficient of variation (CV) | 0.5661232626 |
| Kurtosis | -1.107932071 |
| Mean | 11.71206094 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | -0.4045555234 |
| Sum | 312103 |
| Variance | 43.96313445 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=28)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 14 | 3903 | 14.6% |
| 20 | 1935 | 7.3% |
| 0 | 1934 | 7.3% |
| 18 | 1884 | 7.1% |
| 19 | 1828 | 6.9% |
| 16 | 1799 | 6.8% |
| 17 | 1686 | 6.3% |
| 15 | 1533 | 5.8% |
| 2 | 1069 | 4.0% |
| 7 | 1009 | 3.8% |
| Other values (18) | 8068 | 30.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1934 | 7.3% |
| 1 | 769 | 2.9% |
| 2 | 1069 | 4.0% |
| 3 | 896 | 3.4% |
| 4 | 962 | 3.6% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 27 | 52 | 0.2% |
| 26 | 50 | 0.2% |
| 25 | 49 | 0.2% |
| 24 | 46 | 0.2% |
| 23 | 57 | 0.2% |
booking_day
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.090138097 |
|---|---|
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 3776 |
| Zeros (%) | 14.2% |
| Memory size | 208.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.013497147 |
|---|---|
| Coefficient of variation (CV) | 0.6515880793 |
| Kurtosis | -1.256603876 |
| Mean | 3.090138097 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.07836505984 |
| Sum | 82346 |
| Variance | 4.054170762 |
| Monotonicity | Increasing |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 4054 | 15.2% |
| 4 | 4044 | 15.2% |
| 5 | 4037 | 15.1% |
| 0 | 3776 | 14.2% |
| 3 | 3705 | 13.9% |
| 1 | 3518 | 13.2% |
| 2 | 3514 | 13.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3776 | 14.2% |
| 1 | 3518 | 13.2% |
| 2 | 3514 | 13.2% |
| 3 | 3705 | 13.9% |
| 4 | 4044 | 15.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 4054 | 15.2% |
| 5 | 4037 | 15.1% |
| 4 | 4044 | 15.2% |
| 3 | 3705 | 13.9% |
| 2 | 3514 | 13.2% |
departure_day
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.650517863 |
|---|---|
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 6957 |
| Zeros (%) | 26.1% |
| Memory size | 104.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 2.145478968 |
|---|---|
| Coefficient of variation (CV) | 0.8094565212 |
| Kurtosis | -1.3641934 |
| Mean | 2.650517863 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.1575318788 |
| Sum | 70631 |
| Variance | 4.603080004 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6957 | 26.1% |
| 6 | 3553 | 13.3% |
| 2 | 3454 | 13.0% |
| 4 | 3445 | 12.9% |
| 5 | 3280 | 12.3% |
| 3 | 3133 | 11.8% |
| 1 | 2826 | 10.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6957 | 26.1% |
| 1 | 2826 | 10.6% |
| 2 | 3454 | 13.0% |
| 3 | 3133 | 11.8% |
| 4 | 3445 | 12.9% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 3553 | 13.3% |
| 5 | 3280 | 12.3% |
| 4 | 3445 | 12.9% |
| 3 | 3133 | 11.8% |
| 2 | 3454 | 13.0% |
departure_time
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 26.0 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 14568 | 54.7% |
| 0 | 7556 | 28.4% |
| 1 | 4106 | 15.4% |
| 3 | 418 | 1.6% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
flight_cost
Real number (ℝ≥0)
| Distinct | 1060 |
|---|---|
| Distinct (%) | 4.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5151.78974 |
|---|---|
| Minimum | 2540 |
| Maximum | 20674 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 208.2 KiB |
Quantile statistics
| Minimum | 2540 |
|---|---|
| 5-th percentile | 3323 |
| Q1 | 3797 |
| median | 4680 |
| Q3 | 5910 |
| 95-th percentile | 8731 |
| Maximum | 20674 |
| Range | 18134 |
| Interquartile range (IQR) | 2113 |
Descriptive statistics
| Standard deviation | 1869.544629 |
|---|---|
| Coefficient of variation (CV) | 0.3628922613 |
| Kurtosis | 5.198739453 |
| Mean | 5151.78974 |
| Median Absolute Deviation (MAD) | 883 |
| Skewness | 1.906065332 |
| Sum | 137284893 |
| Variance | 3495197.118 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3797 | 1770 | 6.6% |
| 3746 | 1433 | 5.4% |
| 3955 | 1232 | 4.6% |
| 3481 | 1119 | 4.2% |
| 3271 | 890 | 3.3% |
| 5132 | 872 | 3.3% |
| 4006 | 715 | 2.7% |
| 4712 | 635 | 2.4% |
| 3956 | 560 | 2.1% |
| 4922 | 424 | 1.6% |
| Other values (1050) | 16998 | 63.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2540 | 3 | < 0.1% |
| 2955 | 53 | 0.2% |
| 2956 | 64 | 0.2% |
| 2957 | 202 | 0.8% |
| 3061 | 7 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 20674 | 4 | < 0.1% |
| 20666 | 1 | < 0.1% |
| 18866 | 1 | < 0.1% |
| 18486 | 1 | < 0.1% |
| 18445 | 1 | < 0.1% |
number_of_stops
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9828129691 |
|---|---|
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 5301 |
| Zeros (%) | 19.9% |
| Memory size | 208.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.6768069427 |
|---|---|
| Coefficient of variation (CV) | 0.6886426654 |
| Kurtosis | 1.941358959 |
| Mean | 0.9828129691 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 0.8233686836 |
| Sum | 26190 |
| Variance | 0.4580676376 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 17470 | 65.6% |
| 0 | 5301 | 19.9% |
| 2 | 2979 | 11.2% |
| 3 | 831 | 3.1% |
| 4 | 66 | 0.2% |
| 5 | 1 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5301 | 19.9% |
| 1 | 17470 | 65.6% |
| 2 | 2979 | 11.2% |
| 3 | 831 | 3.1% |
| 4 | 66 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 1 | < 0.1% |
| 4 | 66 | 0.2% |
| 3 | 831 | 3.1% |
| 2 | 2979 | 11.2% |
| 1 | 17470 | 65.6% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
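The calculation described above (covariance divided by the product of the standard deviations) can be sketched in plain Python. This is an illustrative implementation, not the code used to produce the report; the function name `pearson_r` and the sample values are assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so raw sums of products and squares are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / sqrt(ss_x * ss_y)


# Illustrative values: y is a shifted and rescaled copy of x.
x = [0, 1, 2, 3, 2, 1]
y = [7 * v + 3 for v in x]

print(pearson_r(x, y))                     # close to +1 (location/scale invariance)
print(pearson_r(x, [-v for v in x]))       # close to -1
```

Because r is invariant under changes in location and scale, the shifted and rescaled copy still yields a coefficient of essentially +1.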
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
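As a sketch of the definition above, ρ can be computed by replacing each value with its rank and applying the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative, and assigning tied values their average rank is an assumption (a common convention) that the text does not spell out.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r computed from sums of products and squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Illustrative values: a monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(spearman_rho(x, y))  # 1.0, since the ranks agree exactly
print(pearson_r(x, y))     # below 1, since the relationship is not linear
```

The example shows why ρ is better suited to nonlinear monotonic relationships: the ranks match perfectly even though the raw values do not lie on a line.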
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
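The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows; the function name `kendall_tau_a` and the sample values are illustrative. Tied pairs are counted as neither concordant nor discordant here, which is an assumption: library implementations often use the tie-corrected tau-b variant instead, so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; the pair counts for neither side
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Illustrative values: 6 pairs in total, 5 concordant and 1 discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

print(kendall_tau_a(x, y))  # (5 - 1) / 6
```

The quadratic pair loop is kept for clarity; it is only practical for small samples.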
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
| | df_index | flight_path | airline | departure_time_day | booking_day | departure_day | departure_time | flight_cost | number_of_stops |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 1 | 14 | 0 | 0 | 2 | 3500 | 1 |
| 1 | 1 | 3 | 1 | 14 | 0 | 0 | 2 | 3500 | 1 |
| 2 | 2 | 3 | 1 | 21 | 0 | 0 | 3 | 3868 | 1 |
| 3 | 3 | 3 | 2 | 7 | 0 | 0 | 1 | 3500 | 1 |
| 4 | 4 | 3 | 2 | 7 | 0 | 0 | 1 | 3500 | 0 |
| 5 | 5 | 3 | 2 | 14 | 0 | 0 | 2 | 3868 | 0 |
| 6 | 6 | 3 | 5 | 14 | 0 | 0 | 2 | 3500 | 0 |
| 7 | 7 | 3 | 5 | 0 | 0 | 0 | 0 | 3500 | 0 |
| 8 | 8 | 3 | 4 | 0 | 0 | 0 | 0 | 3868 | 0 |
| 9 | 9 | 3 | 5 | 7 | 0 | 0 | 1 | 3501 | 0 |
Last rows
| | df_index | flight_path | airline | departure_time_day | booking_day | departure_day | departure_time | flight_cost | number_of_stops |
|---|---|---|---|---|---|---|---|---|---|
| 26638 | 5773 | 5 | 0 | 0 | 6 | 0 | 0 | 9435 | 1 |
| 26639 | 5774 | 5 | 4 | 7 | 6 | 0 | 1 | 11687 | 1 |
| 26640 | 5775 | 5 | 4 | 7 | 6 | 0 | 1 | 4740 | 1 |
| 26641 | 5777 | 5 | 3 | 0 | 6 | 0 | 0 | 9435 | 1 |
| 26642 | 5778 | 5 | 3 | 14 | 6 | 0 | 2 | 11687 | 1 |
| 26643 | 5779 | 5 | 3 | 14 | 6 | 0 | 2 | 4740 | 1 |
| 26644 | 5780 | 5 | 0 | 7 | 6 | 0 | 1 | 4740 | 1 |
| 26645 | 5781 | 5 | 5 | 14 | 6 | 0 | 2 | 8912 | 1 |
| 26646 | 5782 | 5 | 5 | 14 | 6 | 0 | 2 | 11176 | 1 |
| 26647 | 5783 | 5 | 0 | 21 | 6 | 0 | 3 | 4740 | 3 |